ABSTRACT


The PARAFRASE database is a relational database. It is implemented
using an interface to an existing network database system that is
available on the Prime. This manual describes the PRIMOS commands needed
to execute the query processor, manipulate output files, and save/rerun
query programs. The query language, consisting of an output
specification and a Boolean expression, is described in detail. The
relational view of the database for PARAFRASE is explained, including
attributes, records, and the handling of null values. Finally, the
control cards needed to store an Analyzer run in the database and the
update policy are described.


Key words: relational database, query language, null value
I. INTRODUCTION


The Query Processor is designed to allow the user to retrieve
information from his database by means of a high-level, relational-model
query, even though the underlying database is not of the relational
type. The Query Processor (QP) reads in the user's query and produces a
program, which is then run on the resident database system. The details
of the QP implementation may be found elsewhere. This manual describes
how to use the PARAFRASE database and the query language (i.e., how to
formulate legal queries and specify options). Section IV describes the
entry of Analyzer data into the database, along with other information
specific to the Analyzer. The reader is assumed to be familiar with the
use of a terminal.
II. SYSTEM COMMANDS


This section describes, and in certain cases explains, the system
commands that are necessary to use the Query Processor. First we
describe how to log on and off the terminal. Then we describe the
commands needed to execute the Query Processor. Finally, we list a few
system commands that are closely related to the use of the Query
Processor.


II.1 ACCESSING THE PRIME

To log on at the terminal, type:
LOGIN <user id>

The system responds by typing:
PASSWORD:


At this point the user enters his password (which does not show up on
the screen) followed by a carriage return.

To log off the system when the session is completed, type:


LO 


II.2 EXECUTING THE QUERY PROCESSOR


The user begins executing the Query Processor by typing the system
command given in Figure II.1. In response, the Query Processor prompts
the user for a query and then generates a query application program to
interrogate the database. Next, the Query Processor executes the
application program, which prints the result of the query on the
terminal screen or in a disk file, as specified in the query. These
steps are repeated until the user types a control-D followed by a
carriage return instead of a query.
Before executing the application program, the Query Processor asks the
user whether the application program is to be saved. The user may
answer YES and give a name for the file in which to save this
application program. If the user saves the application program, he can
subsequently execute it again without retyping the query or using the
Query Processor (see the explanation at the end of this section).


Figure II.1 contains a list of commands to execute the Query Processor.
QP: and U: are not part of the commands; they indicate what the Query
Processor types and what the user types, respectively.

U:  SEG PARA>#QPROC -A
QP: Enter query after "Q?" prompt (or ctrl-d) to stop
QP: Q?
U:  <query>
QP: Do you wish to save the query application program?
U:  YES
QP: Enter filename for application program:
U:  <filename 1>
QP: COMO -N

    The application program begins execution here.
    See Figure II.2.

QP: COMO -N
QP: SEG PARA>#QPROC -A
QP: Enter query after "Q?" prompt (or ctrl-d) to stop
U:  <ctrl-d>
QP: Query processor terminated by user.


FIGURE II.1


Figure II.2 describes the messages printed on the terminal during the
execution of the application program. If FILE or FILENOD was specified
in the query (see Section III.1), the application program asks for the
name of the file in which the result of the query should be stored. If
the file already exists, the system asks whether it is OK to modify the
file. If the user does not want the file to be modified (i.e., the user
answers "NO"), the application program asks for a different filename
for the query result. If the user answers YES (i.e., it is OK to modify
the file), the user has the option of overwriting the current file or
appending the new result to it.
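The filename dialog just described amounts to a small decision loop. The Python sketch below replays it under stated assumptions: it is not the application program's code, the function name `choose_output_file` and its `ask` callback (which shows a prompt and returns the user's reply) are hypothetical, and the set of existing files is passed in rather than looked up on disk.

```python
from typing import Callable, Set, Tuple


def choose_output_file(ask: Callable[[str], str],
                       existing: Set[str]) -> Tuple[str, str]:
    """Return (filename, mode), where mode is 'new', 'overwrite' or 'append'.

    Hypothetical sketch of the Figure II.2 dialog; `ask` displays a prompt
    and returns the user's reply.
    """
    while True:
        name = ask("ENTER FILENAME FOR QUERY RESULT:").strip()
        if name not in existing:
            return name, "new"            # no old file: nothing to confirm
        reply = ask(f"OK TO MODIFY OLD {name}?").strip().upper()
        if reply != "YES":
            continue                      # anything but YES: ask for another name
        while True:                       # YES: choose overwrite or append
            choice = ask("OVERWRITE OR APPEND?").strip().upper()
            if choice in ("OVERWRITE", "APPEND"):
                return name, choice.lower()


# Replay Figure II.2 with scripted answers instead of a terminal.
answers = iter(["F3", "NO", "F4", "YES", "OVERWRITE"])
print(choose_output_file(lambda prompt: next(answers), {"F3", "F4"}))
# ('F4', 'overwrite')
```

Passing the replies as a scripted iterator, as in the last lines, reproduces the exchange of Figure II.2 without a terminal.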
Figure II.2 gives a sequence of commands for execution of the
application program. The application program (AP) is initiated by the
Query Processor (QP).

AP: ENTER FILENAME FOR QUERY RESULT:
U:  <filename 3>
AP: OK TO MODIFY OLD <filename 3>?
U:  NO
AP: ENTER FILENAME FOR QUERY RESULT:
U:  <filename 4>
AP: OK TO MODIFY OLD <filename 4>?
U:  YES
AP: OVERWRITE OR APPEND?
U:  OVERWRITE


FIGURE II.2


If the application program is saved by the user as described earlier,
subsequent executions of it are much faster. To initiate the application
program, the user issues the PRIMOS command

SEG #<filename 1>

where <filename 1> is the name given by the user for storage of the
application program (see Figure II.1). Execution of the application
program then proceeds as shown in Figure II.2.


II.3 MISCELLANEOUS COMMANDS


II.3.1 LISTING A FILE


To obtain a line printer listing of a query result that is stored in a
disk file, type:


SPOOL <filename> -FTN


II.3.2 PROGRAM INTERRUPT
When the break key is hit during execution of the program, type:


C ALL
CLUP


Failing to do this immediately after the break key is hit may cause
errors later during execution of programs in the user space because of
open files, and/or may corrupt the database if the break key was hit
during execution of a database retrieval program.
III. THE QUERY LANGUAGE


III.1 HOW TO SPECIFY WHERE YOUR OUTPUT GOES


III.1.1 THE DISPLAY COMMAND


The simplest place for the user to send his output is the same terminal
that he used to enter his query. In this case one begins the query with
the command "DISPLAY" (yes, in capital letters). This command is
followed by the output list (see III.2) and tells the QP to send all the
items in the output list to the terminal. No copy of this information is
saved or printed, so the user must be ready to read the screen or he
will miss his answers. [NOTE: To freeze the output on the screen, hold
down the "CTRL" key while typing "S". To continue viewing, hold down the
"CTRL" key and type "Q". The output will then continue to scroll up the
screen.] For example, if we wanted to see "TITLES" we would enter:


DISPLAY TITLES ...


The ... is the rest of the query, as described in III.2 and III.3 below.


III.1.2 THE FILE COMMAND


The user who wishes to save a copy of his output should use the "FILE"
command. This stores the output in a file that may be retrieved at any
time to be viewed on the screen or printed. The terminal screen will
still display the output as in the case above (III.1.1), except that
there may be some changes in the number of columns, etc. For the same
example we enter:


FILE TITLES ...


Again, the ... is where the rest of the query goes (see III.2, III.3).


III.1.3 THE FILENOD COMMAND
The experienced user who doesn't need to see his output on the screen,
or who wishes to enter another query with a minimum of delay, may choose
the "FILENOD" command. As with the FILE command, a copy of the query
output is stored in a file. In contrast to the other output commands,
however, FILENOD does not display any output at the user's terminal.
When the query has been processed and the output has been put in the
file, the QP simply asks for the next query. The user must then take
care of any printing, listing, etc., as for any other disk file. As one
might expect, our example now becomes:


FILENOD TITLES ...


[NOTE: The command name FILENOD means "file with no display."]
III.2 HOW TO SPECIFY WHAT OUTPUT YOU GET


III.2.1 THE OUTPUT LIST


The output list consists of a list of names and function calls.
Function calls are discussed in detail in the next section (III.2.2).


There are two types of names that may be placed in the output list:
attribute names and record names. The user must be familiar with these
names to ask even a simple query. (If you do not know these names, see
your database administrator.)


Attribute names are names given to specific items of information in the
database. The name gives a clue as to what the information means. For
example, a library database might use AUTHOR and TITLE as attribute
names; their meanings are obvious. A query beginning:


DISPLAY AUTHOR


would display the name(s) of author(s) on the screen. Which names are
displayed depends on the ... part of the query (see section III.3).


Often there are groups of logically related attribute names. In the
previous library example, the attribute names AUTHOR, TITLE, DATE, and
PUBLISHER might all be grouped together into the record BOOK. If the
user would like information from all the attributes in one group, he
can simply give the record name for the group as an item in the output
list. This has the same effect as typing each of the attribute names.
For example, the following two queries are equivalent (assuming the ...
parts are the same as well):


DISPLAY AUTHOR, TITLE, DATE, PUBLISHER ...


DISPLAY BOOK ...


The first query illustrates the last point about output lists: each
item (record name, attribute name, or function call) should be separated
from the next by at least one blank and/or comma. The following queries
are all equivalent and legal:


DISPLAY AUTHOR TITLE DATE

DISPLAY, AUTHOR, TITLE, DATE

DISPLAY,  ,  ,AUTHOR,,, TITLE   DATE

,  ,  DISPLAY      AUTHOR, ,, , TITLE  DATE


Blanks and commas are used as delimiters throughout the query. They are
ignored except that they show where one name ends and the next name
begins.
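Because blanks and commas serve only as separators, splitting a line of the output list into names amounts to treating any run of blanks and commas as a single delimiter. The Python sketch below shows just that rule; it assumes the names contain no other special characters, ignores quotes, brackets, operators, and the semicolon (which the real QP scanner must also handle), and the helper `split_names` is hypothetical.

```python
import re


def split_names(text: str) -> list[str]:
    """Split text on any run of blanks and/or commas, dropping empty pieces."""
    return [tok for tok in re.split(r"[ ,]+", text) if tok]


queries = [
    "DISPLAY AUTHOR TITLE DATE",
    "DISPLAY, AUTHOR, TITLE, DATE",
    "DISPLAY,  ,  ,AUTHOR,,, TITLE   DATE",
    ",  ,  DISPLAY      AUTHOR, ,, , TITLE  DATE",
]
for q in queries:
    print(split_names(q))   # ['DISPLAY', 'AUTHOR', 'TITLE', 'DATE'] each time
```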


III.2.2 BUILT-IN FUNCTIONS


There may be cases in which a user is less interested in a list of
values for some attribute than in a single aggregate value that is a
function of the values in the list. If that function is available as a
built-in function, the computation can be performed by placing a
function call in the output list. At present, the only built-in
function is AVG, which averages one or more attribute or record names.
If the database contained student information, such as the attributes
AGE and EXAMSCORE, the following query would give a list of ages and a
list of scores:


DISPLAY EXAMSCORE AGE


The user could then average each list by hand. The same two averages
can be obtained directly by entering:


DISPLAY AVG(EXAMSCORE, AGE)


One average is output for each unique attribute name appearing in the
function call. [NOTE: Only numerical-type attributes can be averaged.]
Record names may also appear in an AVG function call, but only for
records for which special averaging routines have been devised. [NOTE:
In the PARAFRASE database, only stat records can be averaged.] Duplicate
names in AVG calls (in the same or succeeding calls) are ignored. The
following queries have identical output:

DISPLAY AVG(AGE,EXAMSCORE)
DISPLAY AVG(AGE) AVG(EXAMSCORE)
DISPLAY AVG(AGE, AGE, EXAMSCORE) AVG(EXAMSCORE)

At present, no other built-in functions have been implemented.
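The duplicate rule can be pictured as merging the arguments of every AVG call in the output list into one ordered set, so that each name is averaged exactly once. The Python sketch below assumes the AVG calls have already been parsed into lists of names; that parsed form and the helper name `avg_targets` are hypothetical, not the QP's internal representation.

```python
def avg_targets(calls: list[list[str]]) -> list[str]:
    """Merge the argument lists of all AVG calls, keeping the first
    occurrence of each name and dropping later duplicates."""
    seen: dict[str, None] = {}
    for args in calls:
        for name in args:
            seen.setdefault(name, None)   # dicts preserve insertion order
    return list(seen)


# The three example queries, reduced to their AVG argument lists.
q1 = [["AGE", "EXAMSCORE"]]
q2 = [["AGE"], ["EXAMSCORE"]]
q3 = [["AGE", "AGE", "EXAMSCORE"], ["EXAMSCORE"]]
for calls in (q1, q2, q3):
    print(avg_targets(calls))   # ['AGE', 'EXAMSCORE'] each time
```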
III.3 HOW TO ASK QUESTIONS ABOUT THE DATABASE


III.3.1 THE "WHERE" COMMAND


All the preceding examples contained ... to signify the rest of the
user's query. If the user wants no restrictions on which tuples are
included in the output, this ... should be replaced by a semicolon ";".
Such a non-restricted query is said to contain a "null where clause"
(the WHERE command is not used at all). For example, these two legal
and complete queries contain null where clauses:


DISPLAY TITLE;

FILE AVG(AGE) EXAMSCORE;


[NOTE: If the semicolon is omitted, the QP will prompt the user for the
next line of his query and wait.]


In order to ask for an output list whose members have some property
(other than simply being in the database), one must specify the
conditions that must hold before a value is output. Continuing the
previous library example, suppose we want a list of the titles of all
books written by the author 'KNUTH'. The correct query is:


DISPLAY TITLE WHERE AUTHOR = 'KNUTH';


The "WHERE" command restricts the output to titles of books whose
corresponding AUTHOR attribute is a character string equal to 'KNUTH'.
The semicolon marks the end of the query. (If you forget the semicolon,
the QP will continue to wait for more lines of the query to be entered.
Very long queries should be entered one line (fewer than 120 characters)
at a time, each line followed by a RETURN keystroke.)
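The end-of-query rule (keep reading lines until one contains the terminating semicolon) can be sketched as follows. This Python sketch rests on stated assumptions: lines arrive from an iterator, a semicolon inside a quoted string constant is assumed not to end the query, and end of input (the user's ctrl-d) before a terminator stops reading. The function `read_query` is hypothetical; it is not the QP's reader.

```python
from typing import Iterator, Optional


def read_query(lines: Iterator[str]) -> Optional[str]:
    """Join input lines until a semicolon outside quotes ends the query.

    Returns the query text up to (not including) the ';', or None if the
    input ends first (e.g. the user typed ctrl-d).
    """
    parts: list[str] = []
    in_string = False
    for line in lines:
        for i, ch in enumerate(line):
            if ch == "'":
                in_string = not in_string      # toggle at each quote mark
            elif ch == ";" and not in_string:
                parts.append(line[:i])
                return " ".join(parts).strip()
        parts.append(line)                     # no terminator yet: keep reading
    return None


session = iter(["DISPLAY TITLE", "WHERE AUTHOR =", "'KNUTH';"])
print(read_query(session))   # DISPLAY TITLE WHERE AUTHOR = 'KNUTH'
```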


III.3.2 LOGICAL EXPRESSIONS - THE "WHERE" CLAUSE


The WHERE command and everything to its right (up to the semicolon) is
called the where clause. The where clause is actually one logical
(boolean) expression. The query processor evaluates the expression for
each tuple in the database, and if the expression is true, the output
requested (sections III.1 and III.2 above) occurs for that tuple. In the
example above, each tuple with AUTHOR = 'KNUTH' gives a TRUE value to
the where clause and causes the corresponding value of TITLE to be
displayed. Similarly, if there were 3 such tuples in the database, the
query:


DISPLAY AUTHOR WHERE AUTHOR = 'KNUTH';


would produce the following list as output:


KNUTH 
KNUTH 
KNUTH 
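Conceptually, a query behaves like a filter followed by a projection: the where clause is evaluated for each tuple, and the output list is produced only for tuples where it is true. The Python sketch below illustrates that semantics only. It assumes tuples are dictionaries keyed by attribute name and the where clause is a Python predicate; the function `run_query` and the sample data are hypothetical, and the real QP instead generates an application program for the underlying network database.

```python
from typing import Callable

Row = dict[str, object]   # one tuple: attribute name -> value


def run_query(rows: list[Row], output: list[str],
              where: Callable[[Row], bool] = lambda r: True) -> list[list[object]]:
    """Return the output-list values of every row for which `where` is true.

    The default predicate accepts every row, like a null where clause (';').
    """
    return [[row[name] for name in output] for row in rows if where(row)]


# Hypothetical sample data for the library example.
books = [
    {"AUTHOR": "KNUTH", "TITLE": "TITLE-1"},
    {"AUTHOR": "DIJKSTRA", "TITLE": "TITLE-2"},
    {"AUTHOR": "KNUTH", "TITLE": "TITLE-3"},
    {"AUTHOR": "KNUTH", "TITLE": "TITLE-4"},
]

# DISPLAY AUTHOR WHERE AUTHOR = 'KNUTH';
for values in run_query(books, ["AUTHOR"], lambda r: r["AUTHOR"] == "KNUTH"):
    print(*values)          # prints KNUTH on each of three lines
```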


This query is an example of the comparison of a string constant with a
string-valued attribute. Boolean-valued attributes can be compared with
either of the two boolean constants TRUE or FALSE. Numerical-valued
attributes can appear in arbitrary arithmetic expressions, which can
then be compared to other arithmetic expressions. Unlike string and
boolean comparisons, arithmetic comparisons can use all six relational
operators, not just the equal operator (see III.3.3).


The expression AUTHOR = 'KNUTH' can be combined with other simple
expressions to produce larger logical expressions. For example, the
following query asks for titles of books written by either Knuth or
Dijkstra:


DISPLAY TITLE WHERE [AUTHOR = 'KNUTH' OR AUTHOR = 'DIJKSTRA'];


Note the use of the square brackets. They are required around the
entire where clause only when the expression is not in conjunctive form,
that is, whenever the top-level operators are not all AND. The possible
logical (boolean) operators are AND, OR, and NOT. Normal precedence
rules apply (NOT > AND > OR). Square brackets must be used to override
precedence.
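The precedence rule NOT > AND > OR, with square brackets for grouping, can be illustrated with a small recursive-descent evaluator. This Python sketch is an illustration under assumptions, not the QP's parser: every comparison is taken to be already evaluated to the token TRUE or FALSE, so only the logical layer is shown, and the function name `eval_logical` is hypothetical.

```python
def eval_logical(tokens: list[str]) -> bool:
    """Evaluate a token list built from TRUE, FALSE, NOT, AND, OR, '[' and ']'.

    Grammar, from lowest to highest precedence:
        expr   := term { OR term }
        term   := factor { AND factor }
        factor := NOT factor | '[' expr ']' | TRUE | FALSE
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok}")
        pos += 1
        return tok

    def expr():
        value = term()
        while peek() == "OR":
            take("OR")
            rhs = term()            # always parse the operand before combining
            value = value or rhs
        return value

    def term():
        value = factor()
        while peek() == "AND":
            take("AND")
            rhs = factor()
            value = value and rhs
        return value

    def factor():
        tok = take()
        if tok == "NOT":
            return not factor()     # NOT binds tightest
        if tok == "[":
            value = expr()          # brackets override precedence
            take("]")
            return value
        if tok in ("TRUE", "FALSE"):
            return tok == "TRUE"
        raise SyntaxError(f"unexpected {tok}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing {peek()}")
    return result


print(eval_logical("TRUE OR FALSE AND FALSE".split()))      # True
print(eval_logical("[ TRUE OR FALSE ] AND FALSE".split()))  # False
print(eval_logical("NOT TRUE OR TRUE".split()))             # True
```

The sketch does not enforce the QP's rule that a non-conjunctive where clause be enclosed in brackets; it only shows how precedence and grouping decide the result.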


Suppose we want the titles of all books written by Knuth that cost
below 20. The following query expresses this:


DISPLAY TITLE WHERE AUTHOR = 'KNUTH' AND COST < 20;


PARAFRASE  DATABASE   USER'S  MANUAL 




Square brackets were not required here because only AND appears at the
top level of the logical expression (WHERE ... AND ... ;). Extra square
brackets can appear around any logical subexpression without changing
the result (as long as precedence is not changed):


DISPLAY TITLE WHERE [[ AUTHOR = 'KNUTH'] AND [ COST < 20 ]];


Anything appearing in square brackets must be able to return a true or
false value, and each pair of square brackets must contain at least one
relational operator:


DISPLAY TITLE WHERE [ AUTHOR = 'KNUTH' ];      (<- correct)

DISPLAY TITLE WHERE [AUTHOR] = 'KNUTH';        (<- incorrect)


Some of these restrictions are only temporary. For more details, see
section III.3.4.


III.3.3 ARITHMETIC EXPRESSIONS


Any numerical-valued attribute or constant can appear in an arithmetic
expression. An example was given above in the query fragment COST < 20:
the arithmetic expression COST is compared to the arithmetic expression
20 to give a logical result (true or false). Much more complicated
expressions are allowed using the operators {+, -, *, /, **}. For
example:


DISPLAY TITLE WHERE COST < 126 - (15 + 6)/7;


Any number of arithmetic-valued attributes or constants may be combined
into one arithmetic expression:


DISPLAY TITLE WHERE COST < (INCOME - TAX)/12 + BONUS;


Note the use of parentheses to override the precedence of the
arithmetic operator / over the arithmetic operator -. Every expression
appearing inside a matching pair of parentheses must return an
arithmetic value. [NOTE: A string constant, a logical constant, or an
attribute may also be surrounded by matching parentheses. This will not
cause an error but is always unnecessary. See III.3.4.]


Two arithmetic expressions can be combined into a logical expression by
any of the operators {<, >, =, <=, >=, <>}, which are:


   <     (less than)
   >     (greater than)
   =     (equal)
   <=    (less than or equal to)
   >=    (greater than or equal to)
   <>    (not equal to)

None of these operators can ever appear inside a matching pair of
parentheses, since they return logical (boolean) values! The following
query illustrates the legal use of brackets and parentheses:


DISPLAY TITLE WHERE [[((COST)) >= (.15)*(SALARY-7)]];
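The six relational operators correspond directly to ordinary numeric comparisons applied to two already-evaluated arithmetic expressions. The Python sketch below assumes a lookup table (the name `RELOPS` and the helper `compare` are hypothetical) and lets Python's own arithmetic precedence stand in for the QP's. Python's `/` is real division; whether the QP divides integers the same way is not stated in this manual.

```python
import operator

# The query language's six relational operators and their meanings.
RELOPS = {
    "<": operator.lt,    # less than
    ">": operator.gt,    # greater than
    "=": operator.eq,    # equal
    "<=": operator.le,   # less than or equal to
    ">=": operator.ge,   # greater than or equal to
    "<>": operator.ne,   # not equal to
}


def compare(left: float, op: str, right: float) -> bool:
    """Combine two arithmetic values into a logical (boolean) result."""
    return RELOPS[op](left, right)


# COST < 126 - (15 + 6)/7 : the right-hand side is 126 - 3 = 123.
rhs = 126 - (15 + 6) / 7
print(rhs)                       # 123.0
print(compare(100, "<", rhs))    # True
print(compare(130, "<", rhs))    # False
```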


III.3.4 SPECIAL RESTRICTIONS


In addition to the square-bracket restriction mentioned in III.3.2, the
following restrictions also apply. (Some are temporary.)


1. An attribute that has a character string as a value can be
   compared only to string constants whose length is less than or
   equal to the maximum possible string length of that attribute.

2. Logical attributes can be compared ONLY to logical constants.

3. Null values for numerical attributes are stored as large negative
   numbers, and thus a selection such as COST < 20 is true even if
   COST is null. The user must explicitly exclude nulls if this is
   desired.

   (I.e., COST < 20 will be true for all null values stored for COST.
   Explicit exclusion of nulls would be realized by
   COST < 20 AND COST >= 0.)
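The effect of restriction 3 is easy to reproduce. The manual says only that nulls are stored as "large negative numbers," so this Python sketch assumes a sentinel of -1.0E30; both that value and the name `NULL_SENTINEL` are hypothetical. The explicit test COST >= 0 also assumes that real costs are never negative.

```python
NULL_SENTINEL = -1.0e30   # hypothetical stand-in for "a large negative number"

costs = [12.5, NULL_SENTINEL, 35.0, 8.0]   # the second COST is null

# COST < 20 : the null value also passes the test.
naive = [c for c in costs if c < 20]
# COST < 20 AND COST >= 0 : nulls are explicitly excluded.
explicit = [c for c in costs if c < 20 and c >= 0]

print(naive)      # [12.5, -1e+30, 8.0]
print(explicit)   # [12.5, 8.0]
```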
III.4 ERROR MESSAGES


There are three major types of error messages produced during parsing of
a user's query: recoverable errors (warnings), unrecoverable errors, and
internal errors.


Internal errors are always marked INTERNAL ERROR and occur on overflow
of internal tables, on discovery of invalid data in schema or name
tables, etc. They indicate that either the query was too long or there
is a bug in the system. Internal errors should be reported to the
database administrator at once.


Recoverable errors, or warnings, are given for such things as extra
characters appended to six-character attribute names (six is the maximum
length for attribute names), unrecognized characters, etc. The user
should be sure that he understands the reason for each warning.


Unrecoverable errors include unrecognizable names, incompatible types in
the same expression, operand-operator incompatibility (such as strings
and the < operator), etc. These errors cause rejection of the entire
query and reinitialization of all internal structures. The query
processor will then prompt for a new or revised query. (If you do not
understand why your query was rejected, see your database
administrator.)
IV. THE PARAFRASE DATABASE


IV.1 ANALYZER CONTROL CARDS


To store an Analyzer run in the database, the following control cards
must be used:


/*ID PUNCH=CYBER,CARDS=10000,NAME='file(user number)'
// EXEC ANALYZE,RUN=STATISTICS
//INPUT DD *
&NAME=user's last name
&TYPE=machine type name
&DEFINITION=definition number
&DESCRIPTION=machine type description


After the run is finished, type the following:

GET,file/UN=user number
APPEND,DBASE,file/UN=3PARAFR

Note: In each of the first three "&-cards", blanks are not allowed.
IV.2 MACHINE TYPES


A machine type is an entity used to describe a type of computer
architecture. It is uniquely defined by a user name, a machine type
name, a definition number, and an id number. Associated with a machine
type are the names and versions of the passes and an 80-character
machine type description. The user name, machine type name, definition
number, and machine type description are supplied by the user via
control cards submitted in an Analyzer run. The id number is assigned
by the update program before insertion into the database.


An id number is assigned in order to differentiate machine types that
have the same user name, machine type name, and definition number but a
different set of passes and/or versions. For a given user name, machine
type name, and definition number, the id number assigned to the first
unique set of passes/versions is 1, to the second unique set is 2, etc.
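The id-number rule can be sketched as a table keyed by (user name, machine type name, definition number) that remembers each distinct set of passes/versions in order of first appearance. In this Python sketch the class `IdAssigner`, the representation of a pass set as an ordered tuple of (pass name, version) pairs, and the sample pass names are all hypothetical; they are not the update program's actual data structures.

```python
class IdAssigner:
    """Assign id numbers 1, 2, ... to distinct pass/version sets that share
    a user name, machine type name and definition number."""

    def __init__(self) -> None:
        self._seen: dict[tuple, dict[tuple, int]] = {}

    def assign(self, user: str, mtype: str, definition: int,
               passes: tuple[tuple[str, str], ...]) -> int:
        ids = self._seen.setdefault((user, mtype, definition), {})
        if passes not in ids:
            ids[passes] = len(ids) + 1   # next unused id for this key
        return ids[passes]


a = IdAssigner()
v1 = (("PASS1", "V1"), ("PASS2", "V1"))
v2 = (("PASS1", "V1"), ("PASS2", "V2"))
print(a.assign("JONES", "SIME", 1, v1))  # 1
print(a.assign("JONES", "SIME", 1, v2))  # 2
print(a.assign("JONES", "SIME", 1, v1))  # 1  (same set as before)
print(a.assign("JONES", "SIME", 2, v2))  # 1  (different definition number)
```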


Example of a hierarchy of machine types in the database:


[Figure: a tree of machine types with four levels, from top to bottom:
user names (e.g., Jones, Smith), machine type names (e.g., SIME, MIME),
definition numbers, and id numbers.]


In a query, one may use whatever portion of the trees he desires. By
specifying a user name, a machine type name, a definition number, and an
id number, a single machine type is specified (i.e., one of the leaves).
By not specifying the id number, one will get all machine types having
the specified user name, machine type name, and definition number,
regardless of id number. (This may be desirable if differences in
passes/versions have not altered the semantics of the machine type.)


Similarly, one can specify only a user name and a machine type name, if
the definition is not of concern. If one wants all the machine types he
has defined to be used in a query, he need only specify his user name.
If one wants everybody's machine types to be taken into consideration
in a query, no value is specified for any of the four attributes.
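Specifying only part of the key selects a whole subtree of the hierarchy, which roughly corresponds to a where clause that mentions only some of the attributes USER, MT, DEF and ID (listed in IV.5). The Python sketch below assumes machine types are held as dictionaries keyed by those names and that an unspecified attribute is passed as None. The function `select_mtypes` and the sample data are hypothetical.

```python
from typing import Optional

MTYPES = [  # hypothetical sample data
    {"USER": "JONES", "MT": "SIME", "DEF": 1, "ID": 1},
    {"USER": "JONES", "MT": "SIME", "DEF": 1, "ID": 2},
    {"USER": "JONES", "MT": "MIME", "DEF": 2, "ID": 1},
    {"USER": "SMITH", "MT": "SIME", "DEF": 5, "ID": 1},
]


def select_mtypes(user: Optional[str] = None, mt: Optional[str] = None,
                  definition: Optional[int] = None,
                  ident: Optional[int] = None) -> list[dict]:
    """Keep machine types matching every attribute that was specified;
    an attribute left as None places no restriction."""
    wanted = {"USER": user, "MT": mt, "DEF": definition, "ID": ident}
    return [m for m in MTYPES
            if all(v is None or m[k] == v for k, v in wanted.items())]


print(len(select_mtypes("JONES", "SIME", 1, 1)))  # 1  (a single leaf)
print(len(select_mtypes("JONES", "SIME", 1)))     # 2  (all id numbers)
print(len(select_mtypes("JONES")))                # 3  (all of Jones's types)
print(len(select_mtypes()))                       # 4  (everybody's)
```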


IV.3 NULL VALUES IN THE OUTPUT


Any field containing all asterisks in the output indicates that no value
was stored for this attribute, i.e., the attribute contains a "null
value." In the output of the average of an attribute or a record, a
field of all asterisks indicates that every tuple retrieved contained a
null value for this attribute. Averages are computed on non-null values
only; null values are never averaged in with non-null values.
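The averaging rule, and the all-asterisk field printed when every retrieved value is null, can be sketched in Python. Here None stands for a null value, and the field width of 10 and two decimal places are arbitrary assumptions, since the manual does not give the actual output widths. The names `average` and `format_avg` are hypothetical.

```python
from typing import Optional


def average(values: list[Optional[float]]) -> Optional[float]:
    """Average the non-null values only; return None if every value is null."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def format_avg(values: list[Optional[float]], width: int = 10) -> str:
    """Right-justify the average in a field, or fill the field with '*'."""
    avg = average(values)
    return "*" * width if avg is None else f"{avg:{width}.2f}"


print(format_avg([10.0, None, 20.0]))  # 15.00, right-justified in 10 columns
print(format_avg([None, None]))        # **********
```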


IV.4 UPDATE POLICY


An Analyzer run consists of a machine type, a program, a set of options,
the date and time of the run, and the various statistics computed. A
machine type, a program, and a set of options uniquely determine an
Analyzer run. Therefore, before an Analyzer run is inserted into the
database, a check is made to determine whether a run with the same
machine type, program, and set of options already exists in the
database. If so, that run is deleted before the new run is inserted.
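Because a machine type, a program, and a set of options together identify a run, the update step behaves like a replace on that composite key. The Python sketch below uses a dictionary to stand in for the database; the key layout (tuples ordered like the attributes listed in IV.5), the function name `store_run`, and the sample values are assumptions made for illustration.

```python
from typing import Any


def store_run(db: dict[tuple, dict[str, Any]], machine_type: tuple,
              program: tuple, options: tuple, run: dict[str, Any]) -> bool:
    """Store `run` under its (machine type, program, options) key.

    Any run already stored under the same key is deleted first.
    Returns True if an older run was replaced.
    """
    key = (machine_type, program, options)
    replaced = key in db
    if replaced:
        del db[key]              # delete the existing run ...
    db[key] = run                # ... then insert the new one
    return replaced


db: dict[tuple, dict[str, Any]] = {}
mt = ("JONES", "SIME", 1, 1)          # USER, MT, DEF, ID
prog = ("PKG1", "PRG1")               # PKG, PRG (hypothetical values)
opts = (True, False, True, 100, 64)   # NOSDEF, PIPEDD, VECTRG, DOBND, REGLEN
print(store_run(db, mt, prog, opts, {"RUN": 1}))   # False (first run)
print(store_run(db, mt, prog, opts, {"RUN": 2}))   # True (old run replaced)
print(len(db), db[(mt, prog, opts)]["RUN"])        # 1 2
```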
IV.5 ATTRIBUTES IN THE PARAFRASE DATABASE


1. Machine Types

   record name = MTYPES

   1. name of user (character(10))                       USER
   2. name of machine type (character(10))               MT
   3. definition number of machine type (integer)        DEF
   4. id number of machine type (integer)                ID
   5. description of machine type (character(80))        INFO
   6. pass names (character(8) for each pass)            ******
   7. pass versions (character(10) for each pass)        ******

   (****** means the attribute may not be specified explicitly in a query)


2. Programs

   record name = PROGS

   1. package name (character(8))                        PKG
   2. program name (character(8))                        PRG


3. Options

   record name = OPTNS

   1. no-side-effects (boolean)                          NOSDEF
   2. pipeline-dd (boolean)                              PIPEDD
   3. vector-reg (boolean)                               VECTRG
   4. do-bound (integer)                                 DOBND
   5. reg-length (integer)                               REGLEN


4. Run-date

   record name = RUNS

   1. date of run (character(10))                        DATE
   2. time of run (character(8))                         TIME


5. Compiler Statistics 1 (all integer)

   record name = CS1

   1. Source Statistics
      1. cards                                           CARDS
      2. statements                                      STMTS
      3. do-loops                                        LOOPS
      4. maximum do-loop nest                            NEST

   2. If-Loop Translation
      1. if-loops transformed to while-loops             IFTWL
      2. if-loops transformed to do-loops                IFLOOP

   3. Loop Freezing
      1. loops manually frozen                           HNDFRZ
      2. loops auto-frozen with calls                    CALFRZ
      3. loops auto-frozen with if-loops                 IFLFRZ
      4. loops auto-frozen with io-statements            IOFRZ
      5. loops auto-frozen with loop exits               EXTFRZ
      6. loops auto-frozen with branches around loops    SKPFRZ
      7. loops auto-frozen for other reasons             OTHFRZ
      8. total number of loops auto-frozen               AUTFRZ
      9. total number of loops frozen                    FROZEN

   4. Scalar Renaming
      1. variables renamed                               RENAME
      2. new variables created                           NEWNAM

   5. Scalar Invariant Code Floating
      1. expressions floated                             EXPFLT

   6. Induction Variables
      1. total number of induction variables             INDUCT
      2. induction variables of nest i, i=1,2,3          IVNLi
      3. induction variables of nest > 3                 IVNLG3
      4. induction variables with i increments, i=1,2,3  IViINC
      5. induction variables with > 3 increments         IVG3IN

   7. Scalar Expansion
      1. total number of scalars expanded                EXP
      2. scalars expanded by i dimensions, i=1,2,...,7   DIMi
      3. scalars expanded by > 7 dimensions              DIMG7

   8. Dead Code Elimination
      1. statements eliminated by general dead code      DEAD
      2. statements eliminated by scalar dead code       DEADSC

   9. Loop Collapsing
      1. loops collapsed                                 COLLAP

   10. Loop Fusion
      1. loops fused                                     FUSE

   11. Loop Interchanging
      1. loops interchanged for strip mining             INTSM
      2. vector loops interchanged                       INTVEC
      3. vector loops found                              VLFND
6. Compiler Statistics 2 (all integer)

   record name = CS2

   1. Statement Function Expansion
      1. statement functions                             STMTFN
      2. statement function substitutions                SUBST

   2. Vector FORTRAN Translation
      1. vector statements found                         VSTFND
      2. loops added                                     LPSADD

   3. Loop Blocking
      1. block size                                      LBLKSZ

   4. Compiler Temporary Reusing
      1. variables deleted                               VARDEL
      2. temporaries left                                TMPLFT

   5. Compiler Temporary Shrinking
      1. compiler arrays shrunk                          ARRSHR

   6. Define Variables
      1. variables defined                               VARDEF

   7. Wrap Around Variables
      1. wrap around variables                           WAV
      2. loops                                           WAVLPS

   8. Carry Around Variables
      1. carry around variables                          CAV
      2. loops                                           CAVLPS

   9. Scalar Forward Substitution
      1. scalars forward substituted                     FRWRD
      2. substitutions made                              SFSUBS

   10. LHS Code Floating
      1. statements floated                              LHSFLT
      2. loops deleted                                   LHSLDL

   11. RHS Code Floating
      1. statements floated                              RHSFLT
      2. loops deleted                                   RHSLDL

   12. "Code" Generation
      1. vector statements                               VEC
      2. scalar statements                               SCAL
      3. recurrences                                     REC
      4. vector-recurrences                              VECREC
      5. io-statements                                   IO
      6. conditionals                                    CONDS
      7. serial loops                                    SERLLP
      8. calls                                           CALLS


7.  Recurrence and If-Removal Statistics (all integer)    record name = RECURS

    1.  Recurrence Statistics

        1.  recurrences translated                          RECS
        2.  vector sums
            1.  total number of vector sums                 VSUMT
            2.  number of full vector sums                  VSUMF
            3.  number of modified vector sums              VSUMM
            4.  number of anti-modified vector sums         VSUMA
        3.  vector products                                 VPRODx
        4.  dot products                                    DPRODx
        5.  R<n,1>                                          RN1x
        6.  R<n,1> CC                                       RN1CCx
        7.  R<n,1> RT                                       RN1RTx
        8.  R<n,1> CC RT                                    RCCRTx
        9.  vector MAX                                      VMAXx
        10. vector MIN                                      VMINx
        11. vector ALL                                      VALLx
        12. vector ANY                                      VANYx
            (where x = T, F, M, A)

    2.  IF-Removal

        1.  IFs changed to MAX or MIN                       MAXMIN
        2.  IFs in loops removed                            IFS
        3.  modes used                                      MODES
        4.  exit modes                                      EXTMDS
        5.  mode modifications                              MODEMD
        6.  loop invariant modes floated                    LIMFLT
        7.  leading zeroes found (exit-IFs)                 LZFND
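Items 3 through 12 of the Recurrence Statistics each stand for four attributes, one per suffix letter x. The sketch below expands those names. The meanings attached to the letters (total, full, modified, anti-modified) are an assumption made by analogy with VSUMT, VSUMF, VSUMM and VSUMA; the manual does not define them. The helper names are hypothetical.

```python
# Suffix letters; the readings are assumed by analogy with VSUMT/F/M/A.
SUFFIXES = {"T": "total", "F": "full", "M": "modified", "A": "anti-modified"}

# Base names of Recurrence Statistics items 3-12 (the "x" forms).
BASES = ["VPROD", "DPROD", "RN1", "RN1CC", "RN1RT", "RCCRT",
         "VMAX", "VMIN", "VALL", "VANY"]


def recurrence_attribute_names():
    """Expand every base name with each suffix letter, in T, F, M, A order."""
    return [base + x for base in BASES for x in SUFFIXES]


def describe(name):
    """Split an expanded name into its base and the assumed suffix meaning."""
    base, x = name[:-1], name[-1]
    if base not in BASES or x not in SUFFIXES:
        raise KeyError(name)
    return base, SUFFIXES[x]
```

This yields 40 names, from VPRODT, VPRODF, VPRODM, VPRODA through VANYA.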


8.  Loop Structure (all integer)    record name = LOOPST

    1.  Vectorized Loop Structure - 10 X 10 table           VLij
    2.  Inverted Vectorized Loop Structure - 10 X 10 table  INVLij
        where i, j = S, 0, 1, 2, 3, 4, 5, 6, 7, G
        (S means "sum" and G means ">7")
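Each of the two LOOPST tables thus contributes 100 attribute names. Assuming a name is formed by appending the row label i and the column label j to the prefix, as the forms VLij and INVLij suggest, the names can be generated as in this sketch (the helper is hypothetical):

```python
# Row/column labels of the LOOPST tables: S = "sum", 0..7, G = ">7".
INDICES = ["S", "0", "1", "2", "3", "4", "5", "6", "7", "G"]


def loop_structure_names(prefix):
    """Return the 10 x 10 grid of names for prefix "VL" or "INVL".

    Assumes a name is the prefix followed by the row label i and the
    column label j.
    """
    return [[f"{prefix}{i}{j}" for j in INDICES] for i in INDICES]
```

For example, the row for i = 0 of the VL table runs VL0S, VL00, VL01, ..., VL07, VL0G.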


9.  Sime Simulation Statistics 1 (all real)    record name = SIME

    1.  T(p), p=2**n, n=0,1,...,15                          Tp
    2.  O(p), p=2**n, n=0,1,...,15                          Op


    3.  Serial Statistics

        1.  serial T1                                       SERLT1
        2.  S(p), p=2**n, n=0,1,...,15                      Sp
        3.  E(p), p=2**n, n=0,1,...,15                      Ep
        4.  R(p), p=2**n, n=0,1,...,15                      Rp
        5.  U(p), p=2**n, n=0,1,...,15                      Up
        6.  Q(p), p=2**n, n=0,1,...,15                      Qp
        7.  Alpha(p), p=2**n, n=0,1,...,15                  Ap
        8.  Max S(p)                                        MAXS
        9.  Max E(p)                                        MAXE
        10. Max R(p)                                        MAXR
        11. Max U(p)                                        MAXU
        12. Max Q(p)                                        MAXQ
        13. Max Alpha(p)                                    MAXA

    4.  Parallel Statistics

        1.  parallel T1                                     PARLT1
        2.  S(p), p=2**n, n=0,1,...,15                      SPp
        3.  E(p), p=2**n, n=0,1,...,15                      EPp
        4.  R(p), p=2**n, n=0,1,...,15                      RPp
        5.  U(p), p=2**n, n=0,1,...,15                      UPp
        6.  Q(p), p=2**n, n=0,1,...,15                      QPp
        7.  Alpha(p), p=2**n, n=0,1,...,15                  APp
        8.  Max S(p)                                        MAXSP
        9.  Max E(p)                                        MAXEP
        10. Max R(p)                                        MAXRP
        11. Max U(p)                                        MAXUP
        12. Max Q(p)                                        MAXQP
        13. Max Alpha(p)                                    MAXAP
        14. p with maximum Q(p)                             PMAXQ
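This manual does not define S(p), E(p), R(p), U(p) and Q(p). The sketch below computes them under an assumption: the speedup, efficiency, redundancy, utilization and quality definitions common in the parallel-processing literature, namely S(p) = T(1)/T(p), E(p) = S(p)/p, R(p) = O(p)/O(1), U(p) = O(p)/(p T(p)) and Q(p) = S(p)E(p)/R(p). Alpha(p) is omitted because its definition is not given. The second function derives the maxima and the p at which Q(p) peaks, corresponding to MAXS ... MAXQ and PMAXQ. The function names and dict layout are hypothetical.

```python
def parallel_metrics(T, O):
    """Compute S, E, R, U and Q for each processor count p.

    T and O are hypothetical dicts mapping p (a power of two) to T(p)
    and O(p); both must contain p = 1. Assumed definitions:
        S(p) = T(1) / T(p)          E(p) = S(p) / p
        R(p) = O(p) / O(1)          U(p) = O(p) / (p * T(p))
        Q(p) = S(p) * E(p) / R(p)
    """
    t1, o1 = T[1], O[1]
    metrics = {}
    for p in sorted(T):
        s = t1 / T[p]
        e = s / p
        r = O[p] / o1
        metrics[p] = {"S": s, "E": e, "R": r,
                      "U": O[p] / (p * T[p]), "Q": s * e / r}
    return metrics


def summarize(metrics):
    """Maxima over p (cf. MAXS ... MAXQ) and the p where Q(p) peaks (cf. PMAXQ)."""
    result = {f"MAX{k}": max(m[k] for m in metrics.values()) for k in "SERUQ"}
    result["PMAXQ"] = max(metrics, key=lambda p: metrics[p]["Q"])
    return result
```

With T = {1: 16, 2: 8, 4: 5} and O = {1: 16, 2: 16, 4: 20}, p = 4 gives S = 3.2, E = 0.8, R = 1.25, U = 1.0 and Q of about 2.05, which beats Q = 2 at p = 2, so PMAXQ = 4.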


10. Sime Simulation Statistics 2 (all real)    record name = SIMEH

    1.  Utilization(p,l), p=2**n, n=0,1,...,15,
        l=2**m, m=0,1,...,n                                 UTnm
    2.  Vector-Length(p,l), p=2**n, n=0,1,...,15,
        l=2**m, m=0,1,...,n                                 Vnm
    3.  Sigma(p), p=2**n, n=0,1,...,15                      Zp
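Because m runs only up to n, l never exceeds p, so the Utilization and Vector-Length tables are triangular: each holds 1 + 2 + ... + 16 = 136 entries rather than 16 x 16 = 256. The sketch below enumerates the (p, l) pairs and recovers the exponent n (or m) from a power of two; the helper names are hypothetical.

```python
def simeh_pairs(n_max=15):
    """All (p, l) with p = 2**n, n = 0..n_max, and l = 2**m, m = 0..n."""
    return [(2 ** n, 2 ** m) for n in range(n_max + 1) for m in range(n + 1)]


def exponent(power_of_two):
    """Recover n from p = 2**n (or m from l = 2**m)."""
    if power_of_two < 1 or power_of_two & (power_of_two - 1):
        raise ValueError("expected a positive power of two")
    return power_of_two.bit_length() - 1
```

For p = 2 the valid pairs are (2, 1) and (2, 2); (2, 4) does not occur.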


11. Vector Register Statistics (all integer)    record name = VRSTAT

    1.  Vector Register Assignment

        1.  vector registers allowed                        VR
        2.  mode registers allowed                          MR
        3.  register length                                 REGL
        4.  vector registers used                           VRUSED
        5.  mode registers used                             MRUSED

    2.  Register Chaining

        1.  register chains found                           CHAIN
        2.  statements deleted                              DEADCH

    3.  Pipeline Filling

        1.  original pipelining factor                      ORIGPF
        2.  final pipelining factor                         FINLPF
        3.  pipelining improvement                          PIPFIL


12. Memory Hierarchy Statistics (all integer)    record name = MEMHST

    1.  Clustering

        1.  Before Transformation
            1.  name partitions                             BNP
            2.  minimum number of statements per np         BMINST
            3.  maximum number of statements per np         BMAXST
            4.  average number of statements per np         BAVGST
        2.  After Transformation
            1.  name partitions                             ANP
            2.  minimum number of statements per np         AMINST
            3.  maximum number of statements per np         AMAXST
            4.  average number of statements per np         AAVGST

    2.  Loop Fusion

        1.  name partitions fused                           NPFUSE
        2.  name partitions                                 NP

    3.  Block Indexing

        1.  name partitions block indexed                   BLKNDX

    4.  Non-Basic to Basic

        1.  non-basic name partitions                       NONBAS
        2.  non-basic name partitions changed to basic
            name partitions                                 BASIC

    5.  Scalar Transformations

        1.  scalars forward substituted                     SCLFRW
        2.  scalars expanded                                SCLEXP
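The Clustering attributes summarize how statements are distributed over name partitions before (B prefix) and after (A prefix) the transformation. A minimal sketch, assuming a hypothetical list that gives the number of statements in each name partition, and assuming the average is truncated to an integer because the record is all integer:

```python
def partition_stats(statements_per_partition):
    """Count, minimum, maximum and truncated average of partition sizes."""
    sizes = list(statements_per_partition)
    if not sizes:
        return {"NP": 0, "MINST": 0, "MAXST": 0, "AVGST": 0}
    return {
        "NP": len(sizes),
        "MINST": min(sizes),
        "MAXST": max(sizes),
        # Integer record: the average is truncated here (an assumption).
        "AVGST": sum(sizes) // len(sizes),
    }


def clustering_record(before, after):
    """Prefix the statistics with B (before) and A (after), e.g. BNP, AAVGST."""
    record = {"B" + k: v for k, v in partition_stats(before).items()}
    record.update({"A" + k: v for k, v in partition_stats(after).items()})
    return record
```

Partitions of 3, 5 and 10 statements before and 6 and 12 after give BNP = 3, BAVGST = 6, ANP = 2 and AAVGST = 9.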


13. Trace Generator Statistics    record name = TRCGEN

    1.  Data Input Statistics

        1.  page size (integer)                             PGSIZE
        2.  name of input variable i, i=1,...,10            ******
            (character(7))
        3.  value of input variable i, i=1,...,10           ******
            (character(10))

        (****** means the attributes may not be specified explicitly in a query)

    2.  Simulation Statistics for Each Memory Allocation (all integer)

        1.  page fault i, i=1,2,...,100                     PGFi
        2.  space time cost i, i=1,2,...,100                STCi

    3.  Overview Statistics (all integer)

        1.  array references                                ARRREF
        2.  virtual pages                                   VIRPGS
        3.  minimum page fault                              MINFLT
        4.  memory allocation of minimum page fault         MEMFLT
        5.  minimum space time cost                         MINSTC
        6.  memory allocation of minimum space time cost    MEMSTC
        7.  declared arrays                                 NARRAY
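MINFLT and MEMFLT (and likewise MINSTC and MEMSTC) can be read as the smallest of the per-allocation values PGFi (or STCi) and the allocation at which it occurs. The sketch below assumes that allocation i is simply the index i of PGFi and STCi, and that ties go to the smallest allocation; both are assumptions rather than documented behavior, and the function name is hypothetical.

```python
def overview_minima(page_faults, space_time_costs):
    """Derive MINFLT/MEMFLT and MINSTC/MEMSTC from per-allocation lists.

    page_faults[i - 1] holds PGFi and space_time_costs[i - 1] holds STCi.
    Assumption: allocation i is the index i itself; ties go to the
    smallest allocation (min() returns the first minimal index).
    """
    def argmin_1based(values):
        k = min(range(len(values)), key=values.__getitem__)
        return k + 1, values[k]

    memflt, minflt = argmin_1based(page_faults)
    memstc, minstc = argmin_1based(space_time_costs)
    return {"MINFLT": minflt, "MEMFLT": memflt,
            "MINSTC": minstc, "MEMSTC": memstc}
```

With PGF1..PGF4 = 50, 20, 20, 30 and STC1..STC4 = 100, 90, 95, 80, this gives MINFLT = 20 at MEMFLT = 2 and MINSTC = 80 at MEMSTC = 4.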